The Chernoff Bound as Concentration of Measure

We have already seen some ways in which convex bodies are related to probability. For example, we can
think of the Chernoff bound as the statement that for any unit vector $a$ and real $t$, if $x$ is chosen uniformly
at random from the cube $[-\tfrac12, \tfrac12]^n$ then

$$\Pr[|a \cdot x| > t] \le 2e^{-6t^2}.$$

Since $|a \cdot x|$ is the distance of $x$ from the hyperplane orthogonal to $a$, this says that all but $2e^{-6t^2}$ of the
volume of the cube lies at distance at most $t$ from this hyperplane. Since $a$ was arbitrary, we can conclude
that at least $1 - 2e^{-6t^2}$ of the volume of the cube lies within $t$ of any hyperplane through the origin.
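A quick Monte Carlo sketch can make the bound concrete. It assumes the cube is $[-\tfrac12, \tfrac12]^n$ and draws the unit vector $a$ at random; the function names `cube_tail_fraction` and `chernoff_bound` are hypothetical, and the simulation only illustrates the inequality rather than proving it.

```python
import math
import random


def cube_tail_fraction(n, t, trials=10000, seed=0):
    """Estimate Pr[|a . x| > t] for x uniform in the cube [-1/2, 1/2]^n.

    The unit vector a is drawn at random (an illustrative choice; the
    bound is claimed for every unit vector).
    """
    rng = random.Random(seed)
    # Random unit vector a: normalize a standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    a = [v / norm for v in g]
    hits = 0
    for _ in range(trials):
        # Each coordinate of x is uniform on [-1/2, 1/2].
        dot = sum(ai * (rng.random() - 0.5) for ai in a)
        if abs(dot) > t:
            hits += 1
    return hits / trials


def chernoff_bound(t):
    """The bound 2 exp(-6 t^2) from the text."""
    return 2.0 * math.exp(-6.0 * t * t)
```

For example, `cube_tail_fraction(50, 0.4)` should come out well below `chernoff_bound(0.4)`, which is about 0.77.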

On the sphere we also observed that almost all of the volume lies very close to any hyperplane through the
origin. In light of the probabilistic implications of this assertion for the cube, we are motivated to consider
them for the sphere. First we will need to derive a stronger statement for the sphere.

The Isoperimetric Inequality on the Sphere 

We will consider the analogue of the isoperimetric question for subsets of the surface of the sphere. This 
requires analogues of the notions of distance, volume, and surface area. 

We define the distance $d(x,y)$ between points $x, y \in S^{n-1}$ to be their distance in the usual Euclidean
metric in $\mathbb{R}^n$.

For volumes, we use the unique rotationally invariant measure on the surface of the sphere. The volume
$\mathrm{Vol}(A)$ of a region $A$ on the surface of the sphere is the volume of the union in $\mathbb{R}^n$ of all segments connecting
the origin to a point of $A$, normalized so that the volume of the whole sphere is 1. Alternatively, this is the
measure induced by Haar measure on the rotation group $SO(n)$, which acts transitively on the sphere. (You can
do anything reasonable and get the same measure.)
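Both descriptions of the measure can be sampled directly. The sketch below assumes only the Python standard library: one sampler normalizes a standard Gaussian vector (a standard fact used here without proof), the other follows the cone definition by drawing a uniform point of the ball $B^n$ by rejection from the cube $[-1,1]^n$ and projecting it radially. All helper names are hypothetical.

```python
import math
import random


def sample_sphere_gaussian(n, rng):
    """Uniform point on S^{n-1}: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(v * v for v in g))
    return [v / r for v in g]


def sample_sphere_cone(n, rng):
    """Uniform point on S^{n-1} via the cone definition.

    Draw a uniform point of the ball B^n by rejection from the cube
    [-1, 1]^n, then project it radially onto the sphere.
    """
    while True:
        p = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        r2 = sum(v * v for v in p)
        if 0.0 < r2 <= 1.0:
            r = math.sqrt(r2)
            return [v / r for v in p]


def estimate_volume(indicator, sampler, n, trials=20000, seed=1):
    """Fraction of sampled sphere points for which indicator(x) is true."""
    rng = random.Random(seed)
    return sum(1 for _ in range(trials) if indicator(sampler(n, rng))) / trials
```

In dimension $n = 5$ both samplers should give roughly 0.352 for the set $\{x : x_1 > 0.2\}$; rejection from the cube becomes impractical in high dimension, which is one reason the Gaussian sampler is the usual choice.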

For surface areas, we use the same definition as in $\mathbb{R}^n$. Namely, for a set $A \subseteq S^{n-1}$, define $A_\epsilon$ to be the
set of points in $S^{n-1}$ at a distance of less than $\epsilon$ from some point of $A$. The surface area of $A$ may be defined
as $\frac{d}{d\epsilon}\mathrm{Vol}(A_\epsilon)\big|_{\epsilon=0}$. We won't work with this quantity; instead we will derive bounds on $\mathrm{Vol}(A_\epsilon)$ itself for $\epsilon > 0$.

Now the isoperimetric question is: among sets with a fixed $\mathrm{Vol}(A)$, what is the minimal possible value of
$\mathrm{Vol}(A_\epsilon)$?

The answer is the analogue of a ball: a spherical cap. More precisely, define

$$C(r,v) = \{x \in S^{n-1} : d(x,v) < r\}.$$

This is the ball of radius $r$ centered at $v$ in the metric we have defined on the sphere. The statement that caps
minimize $\mathrm{Vol}(A_\epsilon)$ among sets of a given volume is precisely analogous to the isoperimetric inequality in $\mathbb{R}^n$. (The statement itself will be slightly more
complicated because the optimal ratio $\mathrm{Vol}(A_\epsilon)/\mathrm{Vol}(A)$ depends on the volume of $A$: a small cap is basically a ball in
$\mathbb{R}^{n-1}$, while a very large cap has a very small surface area.)

For convenience, we will also define the "cap at height $t$":

$$C_t = c(t,v) = \{x \in S^{n-1} : x \cdot v > t\}.$$
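For unit vectors the two kinds of cap are the same family of sets. The following one-line calculation, which uses only $|x| = |v| = 1$, converts between the two parametrizations:

```latex
d(x,v)^2 = |x|^2 + |v|^2 - 2\,x\cdot v = 2 - 2\,x\cdot v,
\qquad\text{so}\qquad
C(r,v) = c\!\left(1 - \tfrac{r^2}{2},\, v\right) \quad (0 < r \le 2).
```

In particular, the cap at height $t$ is the ball of radius $\sqrt{2 - 2t}$ around $v$.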

We have seen previously that the volume of a section of the sphere at height $t$ is exponentially small. From
this it follows that the volume of $c(t,v)$ is exponentially small in $t$. In fact, $\mathrm{Vol}(c(t,v)) \approx e^{-nt^2/2}$.
We will prove an approximation to this result soon, but first we consider some consequences.
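The exponential decay is easy to observe numerically. This Monte Carlo sketch takes $v = e_1$ (allowed by rotation invariance) and compares the estimated cap volume against the rough value $e^{-nt^2/2}$ from the text; the function names are hypothetical and the sampling only illustrates the claim.

```python
import math
import random


def cap_volume(n, t, trials=10000, seed=2):
    """Monte Carlo estimate of Vol(c(t, v)) on S^{n-1}.

    By rotation invariance we may take v = e_1, so x . v is just the
    first coordinate of a uniform random point x on the sphere.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x1 = g[0] / math.sqrt(sum(v * v for v in g))  # first coordinate of x
        if x1 > t:
            hits += 1
    return hits / trials


def cap_estimate(n, t):
    """The approximation e^{-n t^2 / 2} from the text."""
    return math.exp(-n * t * t / 2.0)
```

For $n = 50$, the estimates for $t = 0.2, 0.3, 0.4$ should shrink quickly and stay below `cap_estimate`, while `cap_volume(50, 0.0)` should be close to $1/2$.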
Theorem 1 For any $A$ with $\mathrm{Vol}(A) = 1/2$, $\mathrm{Vol}(A_\epsilon) \ge 1 - e^{-n\epsilon^2/2}$.

Proof If $A$ were a spherical cap, then $A_\epsilon$ would be the complement of the spherical cap at height $\epsilon$, which
has volume $1 - e^{-n\epsilon^2/2}$. But by the isoperimetric inequality this is the minimum possible value of $\mathrm{Vol}(A_\epsilon)$.
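As a sanity check (not a proof), one can take $A$ to be the hemisphere $\{x : x_1 \le 0\}$, which has volume exactly $1/2$, and estimate $\mathrm{Vol}(A_\epsilon)$ by sampling. The closed form $\sqrt{2 - 2\sqrt{1 - x_1^2}}$ for the distance to $A$ is our own short derivation: for $x_1 > 0$ the nearest point of $A$ lies on the equator $y_1 = 0$. Function names are hypothetical.

```python
import math
import random


def sample_sphere(n, rng):
    """Uniform point on S^{n-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(v * v for v in g))
    return [v / r for v in g]


def dist_to_hemisphere(x):
    """Euclidean distance from x in S^{n-1} to A = {y in S^{n-1} : y_1 <= 0}.

    For x_1 > 0 the nearest point of A lies on the equator y_1 = 0,
    which gives |x - y|^2 = 2 - 2 sqrt(1 - x_1^2).
    """
    if x[0] <= 0.0:
        return 0.0
    inner = math.sqrt(max(0.0, 1.0 - x[0] ** 2))
    return math.sqrt(max(0.0, 2.0 - 2.0 * inner))


def neighborhood_volume(n, eps, trials=10000, seed=3):
    """Monte Carlo estimate of Vol(A_eps) for the hemisphere A."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(trials) if dist_to_hemisphere(sample_sphere(n, rng)) < eps)
    return inside / trials
```

In dimension 50 the estimates for $\epsilon = 0.2, 0.3, 0.4$ should all exceed $1 - e^{-n\epsilon^2/2}$, with the neighborhood filling almost the whole sphere once $\epsilon$ is a few multiples of $1/\sqrt{n}$.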



This theorem shows that for spheres in high enough dimension almost all of the volume of the sphere lies 
within e of any set containing at least half the volume of the sphere. In fact almost all of the volume of the 
sphere lies within e of any set containing any constant fraction of the volume of the sphere (although the 
constants in the theorem would change). 

We will now go on to use this result to conclude that Lipschitz functions are almost always close to their 
median. 

Lipschitz Functions and Concentration of Measure 

Definition 2 (1-Lipschitz) A function $f : S^{n-1} \to \mathbb{R}$ is 1-Lipschitz if $|f(a) - f(b)| \le |a - b|$ for all $a, b \in S^{n-1}$.

It turns out that many reasonable functions are Lipschitz. For example, the distance from a fixed set is
1-Lipschitz.
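The Lipschitz property of a distance function is a one-line consequence of the triangle inequality. Writing $d_S(x) = \inf_{s \in S} |x - s|$ for a fixed nonempty set $S$ (notation introduced here only for illustration):

```latex
|a - s| \le |a - b| + |b - s| \quad \forall s \in S
\;\Longrightarrow\; d_S(a) \le |a - b| + d_S(b),
```

and exchanging the roles of $a$ and $b$ gives $|d_S(a) - d_S(b)| \le |a - b|$.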

Define a median $M$ of a Lipschitz function $f$ to be a value $M$ such that $\mathrm{Vol}(\{x : f(x) \le M\}) \ge 1/2$ and
$\mathrm{Vol}(\{x : f(x) \ge M\}) \ge 1/2$.

If we take $f$ to be one of the coordinate functions (which are 1-Lipschitz), then the statement that most of
the volume of a sphere lies near any hyperplane through the origin becomes the statement that the value of
$f$ is almost always near its median (which is 0). We will see that in fact all Lipschitz functions are almost always near
their median.

Theorem 3 If $f$ is 1-Lipschitz, $M$ is its median, and $\epsilon > 0$, then

$$\mathrm{Vol}(\{x : |f(x) - M| \ge \epsilon\}) \le 2e^{-n\epsilon^2/2}.$$

Proof The set $A = \{x : f(x) \le M\}$ has volume at least 1/2. Since $f$ is 1-Lipschitz, the set $\{x : f(x) < M + \epsilon\}$ contains $A_\epsilon$. Therefore
by the isoperimetric inequality, $f(x) < M + \epsilon$ holds on at least $1 - e^{-n\epsilon^2/2}$ of the volume of the sphere.
Similarly, $f(x) > M - \epsilon$ holds on at least $1 - e^{-n\epsilon^2/2}$ of the volume of the sphere. Therefore in total $|f(x) - M| \ge \epsilon$
on at most $2e^{-n\epsilon^2/2}$ of the volume of the sphere (since at every point where this inequality holds, at least
one of the previous two must fail). ■
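To see Theorem 3 at work on a function other than a coordinate, the sketch below uses the hypothetical example $f(x) = \max_i x_i$, which is 1-Lipschitz because $|\max_i a_i - \max_i b_i| \le \max_i |a_i - b_i| \le |a - b|$. It estimates the median by the empirical median of samples, so both the median and the deviation fraction are Monte Carlo approximations.

```python
import math
import random


def sample_sphere(n, rng):
    """Uniform point on S^{n-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(v * v for v in g))
    return [v / r for v in g]


def max_coordinate(x):
    """f(x) = max_i x_i, a 1-Lipschitz function on the sphere."""
    return max(x)


def deviation_fraction(f, n, eps, trials=5000, seed=5):
    """Estimate the median M of f on S^{n-1} and Vol({x : |f(x) - M| >= eps})."""
    rng = random.Random(seed)
    values = sorted(f(sample_sphere(n, rng)) for _ in range(trials))
    median = values[len(values) // 2]  # empirical median of the samples
    far = sum(1 for v in values if abs(v - median) >= eps)
    return median, far / trials
```

Although $f$ can take any value in $[1/\sqrt{n}, 1]$, in dimension 100 essentially all samples land within $0.2$ of the empirical median, consistent with the bound $2e^{-n\epsilon^2/2}$.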

Although the range of a 1-Lipschitz function may have diameter 2, this result shows that 1-Lipschitz 
functions are almost constant over most of their domain. We call this result "concentration of measure." 

Note that this result doesn't rely on the exact form of the isoperimetric inequality; it would be fine if the
bound on the ratio $\mathrm{Vol}(A_\epsilon)/\mathrm{Vol}(A)$ was somewhat weaker.



The Isoperimetric Inequality 

We will prove a weaker statement than the full isoperimetric inequality because it is somewhat easier.
Normally we would have to use a symmetrization argument, but after weakening the constants we will
be able to apply Brunn-Minkowski.

Theorem 4 For any $A \subseteq S^{n-1}$ and any $\epsilon > 0$,

$$\mathrm{Vol}(A_\epsilon) \ge 1 - \frac{2e^{-n\epsilon^2/16}}{\mathrm{Vol}(A)}.$$
Proof

We will need the following definition. 



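Definition 5 (Modulus of Convexity) The modulus of convexity $\delta$ for a sphere is

$$\delta(\epsilon) = \inf\left\{1 - \left|\frac{x+y}{2}\right| : x, y \in S^{n-1},\ |x - y| \ge \epsilon\right\}.$$

It is a matter of two-dimensional geometry to compute

$$\delta(\epsilon) = 1 - \sqrt{1 - \frac{\epsilon^2}{4}} \ge \frac{\epsilon^2}{8}$$

(where the inequality comes from the Taylor series).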
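The two-dimensional computation can also be done with the parallelogram law, which additionally justifies the later remark that the worst case is when both points lie on the sphere. For any $x, y \in B^n$ with $|x - y| \ge \epsilon$ (and $\epsilon \le 2$):

```latex
|x+y|^2 = 2|x|^2 + 2|y|^2 - |x-y|^2 \le 4 - \epsilon^2
\;\Longrightarrow\;
\left|\frac{x+y}{2}\right| \le \sqrt{1 - \frac{\epsilon^2}{4}} \le 1 - \frac{\epsilon^2}{8},
```

with equality in the first bound when $|x| = |y| = 1$ and $|x - y| = \epsilon$. The last step is $\sqrt{1-u} \le 1 - u/2$ for $0 \le u \le 1$, which follows by squaring both sides.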

This quantity measures how much more curved the sphere is than required by convexity. Namely, convexity
alone only guarantees that $\delta(\epsilon) \ge 0$ (which is all we would obtain for the $L^1$ or $L^\infty$ norm, whose unit balls have flat faces).
The larger $\delta(\epsilon)$ is, the deeper inside the convex body the midpoints of long segments must lie.

We would like to apply Brunn-Minkowski, but we don't have any result of that sort for the surface of the
sphere. We will pass to a spherical shell, for which we can apply Brunn-Minkowski. Namely, if $A \subseteq S^{n-1}$,
consider $B = [\tfrac12, 1]A$, the union of the sets $sA$ for $\tfrac12 \le s \le 1$. Note that $\mathrm{Vol}(B) \ge \mathrm{Vol}(A)/2$, where the
volume of $B$ is taken in $\mathbb{R}^n$ normalized so that $B^n$ has volume 1 and the volume of $A$ is taken in $S^{n-1}$
normalized so that $S^{n-1}$ has volume 1. The choice of 1/2 in particular is not important. All that matters
is that neighborhoods of $[\tfrac12, 1]A$ centrally project to reasonable neighborhoods in $S^{n-1}$; if we took $(0, 1]A$,
neighborhoods of points near the origin could project to almost all of $S^{n-1}$.

To go from a set $B \subseteq B^n$ to a subset of $S^{n-1}$ we take $\{x/|x| : x \in B\}$. Note that if we define $B = [\tfrac12, 1]A$,
take $B_\epsilon$, and then convert this back to a subset of $S^{n-1}$, we do not necessarily obtain $A_\epsilon$. A point within $\epsilon$
of $\tfrac12 A$ may project back to a point on $S^{n-1}$ as far as $2\epsilon$ from $A$. In fact this is the worst that can happen,
so that $B_\epsilon$ is carried back into $A_{2\epsilon}$. We would like to say that the volume of $B_\epsilon \cap B^n$ is at most the volume
of $A_{2\epsilon}$, so that we can convert a lower bound on the size of $B_\epsilon$ from Brunn-Minkowski into a lower bound on the size of
$A_{2\epsilon}$. This isn't quite true: $B_\epsilon$ may contain points of norm less than 1/2. However, all points in $B_\epsilon$ have norm
at least $\tfrac12 - \epsilon$, so it turns out this does not have a significant effect ($\mathrm{Vol}([\tfrac12 - \epsilon, \tfrac12]A)$ is very small).
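Both volume claims about shells follow from scaling. With the normalization above, the union of segments from the origin to $A$ has volume $\mathrm{Vol}(A)$, and scaling by $s$ multiplies $n$-dimensional volume by $s^n$, so for $0 \le r \le 1$ and $0 \le \epsilon \le \tfrac12$:

```latex
\mathrm{Vol}([r,1]A) = (1 - r^n)\,\mathrm{Vol}(A),
\qquad
\mathrm{Vol}(B) = (1 - 2^{-n})\,\mathrm{Vol}(A) \ge \tfrac12\,\mathrm{Vol}(A),
\qquad
\mathrm{Vol}([\tfrac12 - \epsilon, \tfrac12]A) = \left(2^{-n} - (\tfrac12 - \epsilon)^n\right)\mathrm{Vol}(A) \le 2^{-n}\,\mathrm{Vol}(A).
```

So the thin shell just inside radius $1/2$ carries an exponentially small fraction of the volume.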
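We will show that $\mathrm{Vol}(B_\epsilon \cap B^n) \ge 1 - e^{-2n\delta(\epsilon)}/\mathrm{Vol}(B)$. This will give us the desired result, since then,
using $\delta(\epsilon) \ge \epsilon^2/8$ and $\mathrm{Vol}(B) \ge \mathrm{Vol}(A)/2$,

$$\mathrm{Vol}(A_{2\epsilon}) \ge \mathrm{Vol}(B_\epsilon \cap B^n) \ge 1 - \frac{e^{-2n\delta(\epsilon)}}{\mathrm{Vol}(B)} \ge 1 - \frac{2e^{-n\epsilon^2/4}}{\mathrm{Vol}(A)},$$

and replacing $\epsilon$ by $\epsilon/2$ gives exactly the bound of Theorem 4, which is what we wanted.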

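To bound the volume of $B_\epsilon \cap B^n$, let $C$ be the set of points of $B^n$ at least $\epsilon$ away from every point of $B$.
For any $x \in B$ and any $y \in C$, the definition of the modulus of convexity gives $\left|\frac{x+y}{2}\right| \le 1 - \delta(\epsilon)$ (the worst case
is that both lie in $S^{n-1}$). Writing $\delta = \delta(\epsilon)$, this implies that $\tfrac12(B + C) \subseteq (1 - \delta)B^n$, so that $\mathrm{Vol}(\tfrac12(B + C))^{1/n} \le 1 - \delta$. Now by
Brunn-Minkowski,

$$1 - \delta \ge \mathrm{Vol}\left(\tfrac12(B + C)\right)^{1/n} \ge \tfrac12\mathrm{Vol}(B)^{1/n} + \tfrac12\mathrm{Vol}(C)^{1/n}.$$

By the AM-GM (power-mean) inequality the right-hand side is at least $\mathrm{Vol}(B)^{1/(2n)}\mathrm{Vol}(C)^{1/(2n)}$, and with the inequality $e^{-x} \ge 1 - x$ we conclude

$$\mathrm{Vol}(B)^{1/2}\,\mathrm{Vol}(C)^{1/2} \le (1 - \delta)^n,$$
$$\mathrm{Vol}(C) \le \frac{(1 - \delta)^{2n}}{\mathrm{Vol}(B)} \le \frac{e^{-2n\delta}}{\mathrm{Vol}(B)}.$$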

Taking complements in $B^n$,

$$\mathrm{Vol}(B_\epsilon \cap B^n) = 1 - \mathrm{Vol}(C) \ge 1 - \frac{e^{-2n\delta}}{\mathrm{Vol}(B)},$$

as desired.
Johnson-Lindenstrauss



Johnson-Lindenstrauss can be proved by manipulating Gaussians, but it is quite easy with concentration of 
measure. For now we will just give the setup and outline some applications. 

This is the first example we have seen of the notion of metric embeddings, which turn out to be generally
algorithmically useful. Given a metric $d$ on a finite set of points $X$, we would like to find a map $f : X \to \mathbb{R}^m$
such that $d(x,y) \approx d(f(x), f(y))$ for the usual Euclidean metric $d$ on $\mathbb{R}^m$. More precisely, for any map
$f : X \to \mathbb{R}^m$ we define the distortion $D$ to be the ratio between the largest and smallest values of

$$\frac{d(x,y)}{d(f(x), f(y))}$$

as $x$ and $y$ vary. We would like to find an embedding with distortion $1 + \epsilon$.

The Johnson-Lindenstrauss Theorem states that if the metric on $X$ arises from an embedding of $X$ into
any Euclidean space $\mathbb{R}^n$, then $X$ can be embedded with distortion at most $1 + \epsilon$ in $\mathbb{R}^k$ for $k = O(\epsilon^{-2}\log|X|)$.
More concretely, this embedding is given by projection onto a random $k$-dimensional subspace, and with high probability the ratio
$d(f(x), f(y))/d(x,y)$ is very nearly $\sqrt{k/n}$ for every pair $x, y$.
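A small pure-Python sketch of the random-projection embedding, under stated assumptions: the random $k$-dimensional subspace is generated by Gram-Schmidt on $k$ independent Gaussian vectors (a standard way to obtain a uniformly random subspace, used here without proof), the point set is itself random, and all function names are hypothetical. It computes the distortion as defined above together with the ratios $d(f(x), f(y))/d(x,y)$, which should cluster near $\sqrt{k/n}$.

```python
import math
import random


def random_subspace_basis(n, k, rng):
    """Orthonormal basis (k vectors in R^n) of a random k-dimensional subspace.

    Built by Gram-Schmidt on independent standard Gaussian vectors; the span
    of such vectors is a uniformly random subspace.
    """
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            # Remove the component of v along the already accepted vector b.
            c = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # degenerate draws have probability zero
            basis.append([vi / norm for vi in v])
    return basis


def project(x, basis):
    """f(x): coordinates of the orthogonal projection of x onto span(basis)."""
    return [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]


def pair_ratios(points, images):
    """d(f(x), f(y)) / d(x, y) over all pairs of distinct points."""
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(math.dist(images[i], images[j]) / math.dist(points[i], points[j]))
    return ratios


def distortion(ratios):
    """Distortion D. Inverting every ratio to d(x,y)/d(f(x),f(y)) leaves max/min unchanged."""
    return max(ratios) / min(ratios)
```

With, say, 20 Gaussian points in $\mathbb{R}^{100}$ projected to $k = 60$ dimensions, every ratio is at most 1 (orthogonal projection never increases distances), their average sits close to $\sqrt{0.6} \approx 0.77$, and the distortion is modest; rescaling by $\sqrt{n/k}$ would make the ratios close to 1 without changing $D$.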




This result is extremely useful in a number of situations. If we wish to answer some question about a
fixed set of points which depends only on their pairwise distances, then Johnson-Lindenstrauss allows us to
reduce the problem to one in logarithmic dimension (for fixed $\epsilon$) by randomly projecting. If our algorithm
has bad dependence on the dimension, this may reduce the runtime considerably (for example, exponential
dependence on the dimension becomes polynomial dependence on the number of points). Similarly, if we are dealing with a stream of very high-dimensional data and
do not have the storage space to record it all, Johnson-Lindenstrauss allows us to retain a very small fraction
of this data while approximately preserving the answer to any question which depends only on distances.

We will prove this result in the next lecture. 